OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Document Reconstruction by Layout Analysis of Snippets

Internal identifier: 000786 (Main/Exploration); previous: 000785; next: 000787

Authors: Florian Kleber; Markus Diem; Robert Sablatnig [Austria]

RBID : Pascal:10-0398506

Abstract

Document analysis is used to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout and structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper, document analysis is applied to snippets of torn documents to calculate features for their reconstruction. Documents may be destroyed either deliberately, to make the printed content unavailable (e.g. tax fraud investigation, business crime), or through time-induced degradation of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. This paper shows the possibility of using document analysis techniques on snippets to support the matching algorithm with additional features, namely a rotational analysis, a color analysis and a line detection. As future work, it is planned to extend the feature set with the paper type (blank, checked, lined), the type of writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Document Reconstruction by Layout Analysis of Snippets</title>
<author>
<name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</author>
<author>
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
</author>
<author>
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institute of Computer Aided Automation, Vienna University of Technology, Favoritenstr. 9</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0398506</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0398506 INIST</idno>
<idno type="RBID">Pascal:10-0398506</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000171</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000606</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000160</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Kleber F:document:reconstruction:by</idno>
<idno type="wicri:Area/Main/Merge">000791</idno>
<idno type="wicri:Area/Main/Curation">000786</idno>
<idno type="wicri:Area/Main/Exploration">000786</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Document Reconstruction by Layout Analysis of Snippets</title>
<author>
<name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</author>
<author>
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
</author>
<author>
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institute of Computer Aided Automation, Vienna University of Technology, Favoritenstr. 9</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Image analysis</term>
<term>Imagery</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse image</term>
<term>Imagerie</term>
<term>Algorithme</term>
<term>0130C</term>
<term>4230</term>
<term>Réserve</term>
<term>Méthode</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Also skew detection of scanned documents is performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can either be destroyed by the intention to make the printed content unavailable (e.g. tax fraud investigation, business crime) or due to time induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper the possibility of document analysis techniques of snippets to support the matching algorithm by considering additional features are shown. This implies a rotational analysis, a color analysis and a line detection. As a future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Autriche</li>
</country>
<region>
<li>Vienne (Autriche)</li>
</region>
<settlement>
<li>Vienne (Autriche)</li>
</settlement>
</list>
<tree>
<noCountry>
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
<name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</noCountry>
<country name="Autriche">
<region name="Vienne (Autriche)">
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
</region>
</country>
</tree>
</affiliations>
</record>
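
The record above uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware XML parser will most likely reject it as written. The following is a minimal Python sketch, not part of Dilib, of one way to read such a record with the standard library. The namespace URIs are placeholders invented only to satisfy the parser, the helper names `parse_record` and `summarize` are hypothetical, and `RECORD` is a shortened copy of the record above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above; note the undeclared "wicri:" prefix.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Document Reconstruction by Layout Analysis of Snippets</title>
<author>
<name sortKey="Kleber, Florian" first="Florian" last="Kleber">Florian Kleber</name>
</author>
<author>
<name sortKey="Sablatnig, Robert" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3">
<country>Autriche</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs (assumptions, not the real Wicri/INIST URIs).
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}


def parse_record(text):
    """Declare the missing prefixes on the root element, then parse."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names and countries from the titleStmt."""
    stmt = root.find(".//titleStmt")
    title = stmt.findtext("title")
    authors = [name.text for name in stmt.findall("author/name")]
    countries = sorted({c.text for c in stmt.iter("country")})
    return {"title": title, "authors": authors, "countries": countries}


root = parse_record(RECORD)
info = summarize(root)
print(info)
```

Patching only the root start tag leaves the record content untouched, so the same approach should carry over to the full record, provided its root element is written exactly as `<record>`.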

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000786 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000786 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0398506
   |texte=   Document Reconstruction by Layout Analysis of Snippets
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024